Introduction


This Final Report summarizes the work by the Social Science Research
Institute, University of Southern California on subcontract P.O.
76-030-0715 from Decisions and Designs, Inc., prime contract
N00014-76-C-0074 from the Advanced Research Projects Agency, monitored by
the Engineering Psychology Programs, Office of Naval Research. The research
conducted during this contract period from October 1, 1976 to September 30,
1977 under the direction of Professor Ward Edwards, the Principal
Investigator, was part of an ongoing program of Research on the Technology
of Inference and Decision. Edwards (1973, 1975) and Edwards and Seaver
(1976) summarized previous research.

The proposal leading to this subcontract called for research on
five specific topics: measurement and validation of multiattribute
utilities, sensitivity to approximation of multiattribute models and
assessment procedures, group processes for probability assessment,
assessing very small probabilities, and biases in subjective probability
distributions on non-percentage variables. Our research on these and
other topics is reported in ten technical reports which have been
produced or are now being prepared. Summaries of these technical
reports appear at the end of this report.

The purpose of this report is to explain how this research integrates
into an overall program of research on decision technology. Thus, we do
not report in detail findings that are set forth in the self-contained
technical reports. Only major findings are reviewed along with ongoing
research and future research possibilities suggested by our current work.


II. A Technical Overview


Research at SSRI has sought to determine the strengths which
decision makers bring to the decision situation as well as to study
decision aids and techniques that improve reliability and validity of
judgments. Both theoretical and practical topics have been examined;
the line between today's theoretical research and tomorrow's tool or
technique has been deliberately blurred. In the practical vein, we
have studied simplification techniques for construction of utility
models, aids to the decision maker in the assessment of small
probabilities, effects of various response modes for the elicitation of
probabilities, and comparison of group behavioral techniques versus
mathematical aggregation models for group probability assessment. More
theoretical work has sought to determine the individual's ability to
deduce distributions underlying the generation of stimuli. Recently we
have become interested in both qualitative and quantitative aspects of
expertise as it affects the judgment of uncertainty.


II  A.  Elicitation  and  Quantification  of  Uncertainty 

II A.1. Group assessment of uncertainty: Human interaction versus
mathematical models.

Often a decision maker is not a single individual but rather
several, each of whom should be able to influence the final decision.
In decision analysis a single judgment of uncertainty as well as a
single judgment of value or utility is necessary as input to each branch
of the decision making structure. This apparent incompatibility has led
to much research into techniques whose aim is to derive from the group a
single value for each measure of uncertainty. Such research has explored
two major strategies: mathematical techniques for the aggregation of
individual judgments into a single group estimate, and behavioral
techniques which seek group consensus.

Either approach has both mathematical and social psychological
difficulties. Dalkey (1972) has shown that no formal rule for the
aggregation of individual probabilities can satisfy a set of reasonable
conditions (such as non-dominance by a single group member). A comparable
proof exists for utility judgments (Arrow, 1951). Behavioral techniques
likewise have limitations. Individual group members may concern
themselves more with reaching consensus than with the quality of the
agreed-on judgment. Various factors such as individual dominance through
personality characteristics or rank within the organization may influence
judgments despite their irrelevance to the task.

In an effort to compare various behavioral and mathematical techniques
of group probability assessment, Seaver (1977, in press) experimentally
compared two aggregation rules, weighted arithmetic means and weighted
geometric means, and three weighting procedures: equal weights, weights
based on self-rating, and DeGroot weights (DeGroot, 1974). Five behavioral
interaction techniques were compared: the Delphi method (Dalkey and
Helmer, 1963); the Nominal Group Technique, developed by Delbecq and Van
de Ven (1971); a modified nominal group technique in which group members
state their estimates and reasons with no discussion; a consensus technique
in which groups were to arrive at consensus in any way they wished; and
a no-interaction or control group in which group members made estimates
with no knowledge of other group members' estimates.
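
A minimal sketch of the two aggregation rules named above, for a single
binary event. The function names are illustrative, and renormalizing the
geometric pool over the event and its complement is an assumption; the
report does not state the exact formula Seaver used.

```python
import math


def weighted_arithmetic_mean(probs, weights):
    """Linear pool: weighted average of the individual probabilities."""
    total = sum(weights)
    return sum(w * p for p, w in zip(probs, weights)) / total


def weighted_geometric_mean(probs, weights):
    """Weighted geometric pool for one binary event.

    Assumption: the pooled values for the event and its complement are
    renormalized to sum to 1. Every probability must lie strictly
    between 0 and 1.
    """
    total = sum(weights)
    log_yes = sum(w * math.log(p) for p, w in zip(probs, weights)) / total
    log_no = sum(w * math.log(1.0 - p) for p, w in zip(probs, weights)) / total
    yes, no = math.exp(log_yes), math.exp(log_no)
    return yes / (yes + no)


# Hypothetical estimates from three group members, with equal weights.
estimates = [0.10, 0.20, 0.60]
equal_weights = [1.0, 1.0, 1.0]
arithmetic = weighted_arithmetic_mean(estimates, equal_weights)
geometric = weighted_geometric_mean(estimates, equal_weights)
```

Self-rating or DeGroot weights would simply replace `equal_weights`; how
those weights are derived is outside this sketch.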

The quadratic scoring rule was used as the criterion for measuring
the quality of group assessments. The well-known insensitivity of that
rule may account for the lack of significant differences among behavioral
techniques. In general, interaction among group members reduced differences,
reduced the calibration of the judgments, and increased the extremeness
of judgments. Therefore, deciding whether or not to use group interaction
techniques involves a tradeoff between calibration and extremeness of
responses. Although no significant differences were found, slight
differences as well as the results of other studies point to slight
superiority of the nominal group technique to other group interaction
methods.
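
For reference, a quadratic scoring rule can be sketched as below. This
assumes the common penalty form (sum of squared differences between the
assessed probabilities and the 0/1 outcome indicator); the report does not
say which variant of the rule was used, and the names are illustrative.

```python
def quadratic_score(probs, outcome_index):
    """Quadratic (Brier-type) penalty: 0 is perfect, larger is worse."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


# Hypothetical two-event assessments in which event 0 occurred.
moderate = quadratic_score([0.6, 0.4], 0)
bolder = quadratic_score([0.7, 0.3], 0)
```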

The data show that little if anything is lost by using mathematical
techniques to aggregate individual judgments rather than behavioral
interaction. Considering the practical disadvantages of face-to-face
meetings of groups, no point exists in bothering with the sometimes
lengthy procedures of behavioral interaction. While results of this
experiment dealt totally with point estimates, further studies will
attempt to elicit continuous distributions. Results of these studies
will become available over the next several months.


II A.2. Response scale effects on likelihood ratio judgments.


Several studies (Goodman, 1973; Phillips and Edwards, 1966) have
shown consistent effects for different methods of elicitation of subjective
judgment. Stillwell, Seaver and Edwards (1977) defined the effects due
to the scale on which the subjects responded, but at the time it was felt
that results could at least partly be due to the extreme diagnosticity
of the data to which the subjects responded. A d' of 3.0 was used to
generate data and the range of veridical likelihood ratios to which the
subjects responded was such that likelihood ratios as high as 12,000:1
were encountered. Although significantly better (closer to veridical)
responses were found when the subjects were responding to logarithmically
spaced scales, it was felt that a more moderate d' and range of likelihood
ratios might reduce the magnitude of differences or change the relationship
altogether. A second experiment was therefore undertaken in which d'
and the range of true likelihood ratios were varied.
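
To see how d' drives the range of veridical likelihood ratios, the sketch
below assumes the usual equal-variance model: data are drawn from one of
two unit-variance normal distributions whose means differ by d'. That
model and the function names are assumptions; the report does not
describe the data-generating process in detail.

```python
import math


def likelihood_ratio(x, d_prime):
    """Likelihood ratio of N(d', 1) against N(0, 1) at the datum x."""
    return math.exp(d_prime * (x - d_prime / 2.0))


def datum_for_ratio(ratio, d_prime):
    """Datum x at which the likelihood ratio equals `ratio`."""
    return math.log(ratio) / d_prime + d_prime / 2.0


# Under this assumed model with d' = 3.0, a ratio of 12,000:1 arises for
# a datum roughly 4.6 standard deviations above the lower mean.
x_extreme = datum_for_ratio(12000.0, 3.0)
```

With a smaller d' the same datum yields a far less extreme ratio, which
is the manipulation the second experiment describes.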

The results resembled those of the first experiment. Logarithmically
spaced scales were superior to linearly spaced scales. The range of
true likelihood ratios was, however, shown to have a strong and significant
effect on performance. Subjects were much better able to approximate
veridical judgments when less extreme true likelihood ratios were chosen.
There was also a significant interaction between endpoint and spacing
(logarithmic versus linear) accounting for a relatively large proportion
of the variance.




Scale endpoints were shown to influence judgments consistently.
Either of two factors may be contributing to this finding. It is
possible that the upper endpoint offers an upper bound to responses,
thereby limiting the range of values expressed. A second possibility
is that the endpoints controlled subjects' judgments about the range
in which they could expect the true value to fall. More extreme endpoints
may thus produce more extreme responses. For a more detailed description
of the results of this study, see Summary No. 2.

II A.3. Averaging as a means of probabilistic inference.

Edwards and Seaver (1976) discussed an experiment by Eils, Seaver
and Edwards (1977) in which averaged log likelihood ratios were elicited
from subjects and used as inputs to a probabilistic information processing
(PIP) system. That is, such log likelihood ratios judgmentally averaged
over all data and then processed by means of Bayes' theorem produced more
extreme final odds than posterior odds estimated directly. In light of
the general finding of conservatism in probability revision tasks, this
would suggest that PIP outputs are more likely to reflect subjective
certainty than are posterior odds judgments. Experiment I also showed
that persons using the averaged log likelihood ratio judgments were more
orderly in these judgments as evidenced by higher correlation between true
final odds and final odds calculated via Bayes' theorem.
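
The arithmetic behind this use of an averaged log likelihood ratio can be
sketched as follows, assuming conditionally independent data and natural
logarithms (both assumptions of the sketch; names are illustrative).
Under Bayes' theorem the posterior log odds equal the prior log odds plus
the sum of the log likelihood ratios, so a mean over n data is multiplied
by n before being applied.

```python
import math


def posterior_odds_from_mean_llr(prior_odds, mean_log_lr, n_data):
    """Posterior odds from a judged mean log likelihood ratio over n data."""
    return prior_odds * math.exp(n_data * mean_log_lr)


def posterior_odds_from_lrs(prior_odds, likelihood_ratios):
    """Posterior odds from separately judged likelihood ratios."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


# Hypothetical likelihood ratios for three data, with even prior odds.
ratios = [2.0, 4.0, 0.5]
mean_llr = sum(math.log(r) for r in ratios) / len(ratios)
via_mean = posterior_odds_from_mean_llr(1.0, mean_llr, len(ratios))
via_product = posterior_odds_from_lrs(1.0, ratios)
```

When the judged mean is exact, both routes give the same posterior odds;
the experiments concern how well people can supply that mean directly.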

A second experiment was undertaken in order to determine whether
the judgmentally averaged log likelihood ratio technique contributed
significant improvement over the likelihood ratio judgment originally
proposed for the PIP system developed by Edwards et al. (1968). Also a
variable in Experiment II was the diagnosticity of the data used to elicit
subjects' responses. It was found that data diagnosticity affected quality
of response for both response modes. Estimates became more veridical as the
data became more diagnostic. The primary finding of the study was that
quality of estimates did not differ significantly in either veridicality or
orderliness between likelihood ratio estimates as originally proposed for
the PIP technique and the averaged log likelihood estimates. Both methods
were found to produce better estimates than cumulative certainty judgment,
as is usual in such comparisons.


The reason for considering an alternative to likelihood judgments
is that a problem may arise in applying PIP systems in real world contexts.
The people assessing the likelihood ratios will typically have access to
feedback about the posterior odds that are calculated from their likelihood
ratios. Goodman (1973), in a reanalysis of data from five studies exploring
methods of eliciting judgments about uncertain events, concludes that
feedback about the implications of judgments makes them less extreme and is
probably the most powerful variable controlling the extremeness of the
judgments. Thus, even a PIP system may be susceptible to conservatism in
real world applications. This problem seems less likely to characterize
judgments of average certainty due to the very nature of the elicited
judgments. Should further research confirm feedback-produced conservatism
in PIP systems, average certainty judgments may prove to be a useful
alternative to PIP.

These findings also compel a rethinking of the misaggregation
explanation of conservatism in probability revision. Mean log likelihood
ratio is a judgmentally aggregated response, but it is not conservative
(nor yet radical). Apparently, aggregation that has the character of a
sum or product (i.e., the target number is outside the range of input
quantities) is conservative. Aggregation that has the character of an
average (the target number lies near the middle of the range of input
quantities) is unbiased.

II A.4. The assessment of small probabilities.

In the study of small probability, high (either positive or negative)
expected value decision making situations, decision analysis can make
significant contributions. Nuclear engineering has brought this situation to
public attention as system failures occur with probabilities typically smaller
than 10“^ but with values which may exceed 50,000 lives lost. But identical
kinds of problems arise frequently in military and political contexts. An
obvious example is whether or not a particular limited-war strategy may lead
to a widening of the war. Both experimental and applied work have shown,
however, that problems arise in the subjective assessment of the likelihood
of highly unlikely events.

Unpublished work by Slovic, Lichtenstein, Fischhoff, Coombs, and
Layman suggests a remedy for the small probability assessment problems.
Instead of direct assessment of the probability of interest, Slovic
et al. asked subjects to judge which of two events was the more likely.
They found, with a few notable exceptions, that over eighty percent of
subjects could correctly judge the larger of the probabilities of a pair
of events when the ratio of the probabilities was greater than 2:1.

These findings suggest that either a series of comparisons
of event pairs or a simultaneous comparison of the event with unknown
probability with a list of events with known probability may result in
significant improvement in probabilistic judgments. Several studies were
undertaken to evaluate the potential of this approach. In the first set
of these experiments subjects were asked to place the event of interest
into a list of events at a point appropriate to its relative likelihood
of occurrence. Incentive was given to subjects in the form of $3.00 for
each response placed in the correct space among thirty spaces between events.
Results of this experiment show that subjects are not sufficiently able to
perform the task to warrant the technique's use as an elicitation tool for
probabilities. The mean correlation over 120 subjects between response
probability (using the midpoint of the response space) and the true
probability (using the midpoint of the space in which the event should be
placed) was .131.

Because the above results might be due to the cognitive difficulty
of simultaneously comparing an event with thirty-one other events, a
second experiment asked subjects to compare the event of interest with
either one or two other events. A branching structure was used whereby
the subject, after making a series of judgments, would arrive at an
estimate of the probability of the response event. This probability was
evaluated in the same manner as in the previous experiment using the
midpoint of the space arrived at by the branching process. Results of
this experiment are comparable with those of the first experiment.
Correlations are, on the average, slightly positive but for no subject
were they significant or large enough to justify the technique.

The findings of these two experiments raise a significant question.
How does one explain the apparent divergence of these results from those
of Slovic et al.? The nature of the events used in the current series
of experiments differed from those used by Slovic for three-fourths of
the subjects; but analysis of the one-fourth using occupation events,
selected from those used by the Slovic group, shows the same inability to
make accurate judgments. Closer examination of the task explains the
discrepancy. While subjects in the Slovic et al. experiment produced
relatively good directional judgments when the odds ratio between events
was greater than 2:1, event pairs with odds ratios less than 2:1 produced
very poor judgments. Frequently, subjects systematically chose the less
probable event as more likely. In the current experiments, subjects
were required to make successively more sensitive judgments so that even
if their initial pairwise judgments were correct, later judgments of the
branching or list placement tasks involved choices too sensitive for
their abilities.

We have two ideas about the usefulness of techniques developed out
of the work of Slovic et al. First, for the skill of the probabilistic
judge to be effective in realizing improved estimates it seems likely
that aggregation over individuals must occur in some form. Slovic found
that directional choices in the pairwise task were likely to be correct
for a large proportion of subjects, but not all persons were correct and
most odds ratio judgments were too conservative. A sort of majority
rule principle in paired comparisons of probabilities may well yield
improved probabilistic judgments.
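
A rough illustration of why a majority rule could help, assuming each
judge picks the more probable event independently and with the same
probability. Both the independence assumption and the function name are
ours for illustration; real judges share information and biases, so the
gain would be smaller in practice.

```python
from math import comb


def majority_correct_probability(p_correct, n_judges):
    """Chance that more than half of n independent judges choose correctly."""
    if n_judges % 2 == 0:
        raise ValueError("use an odd number of judges to avoid ties")
    needed = n_judges // 2 + 1
    return sum(
        comb(n_judges, k) * p_correct ** k * (1.0 - p_correct) ** (n_judges - k)
        for k in range(needed, n_judges + 1)
    )


# If 80 percent of individuals judge a pair correctly, a five-person
# majority is correct about 94 percent of the time under independence.
five_judges = majority_correct_probability(0.8, 5)
```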

A second issue concerns the nature of expertise. A serious question
for the use of decision analysis is and has been: Can experts make the
needed probabilistic judgments? Experimentation on probabilistic bias
suggests that these judgments are of decidedly low quality although
there is evidence that experts perform these tasks somewhat better than
college sophomores, such as we have been using. A pilot study has been
performed which points to some interesting possibilities in this area.

We defined expertise as experience or familiarity with the subject
matter whose relative likelihood was to be judged. In the case of the
initial experiment this was baseball statistics for the Los Angeles
Dodgers players. Subjects were asked ten questions of the type, "Is it
more likely that a randomly selected Dodger player (not a pitcher) has
25 or more home runs this season or is it more likely that a randomly
selected Dodger player (not a pitcher) has 88 or more bases on balls
this season?" At the same time subjects were asked ten questions about
the relative likelihood that a person selected at random would turn out
to be, for example, a lawyer or secretary. Those questions about
employment were taken directly from the Slovic et al. study so that a
comparison between subjects could be made.

The pattern of responses for the Slovic et al. questions in this
study was roughly the same as it was in the original study. Percentages
of correct directional responses were slightly higher for six questions
and lower for the other four than for Slovic et al. Our subjects were
therefore comparable to those in the original study.

As a measure of expertise subjects answered a series of questions
about themselves concerning the number of games they had attended, how
many times they read the box scores, etc., as well as the number of
rostered Dodger players they could name. They also rated themselves as
Dodger fans on a 7-point scale. Regression analysis was then done with
these measures of baseball expertise as predictors of quality of
performance on both the Dodger questions and the Slovic et al. questions.

The multiple regression results are very similar for Dodger
and Slovic et al. questions. The multiple regression coefficients were
.55 and .48 respectively. That is, Dodger fans do better on questions
about employment, as well as about the Dodgers, than non-Dodger fans. The
similarity between these two coefficients suggests the possibility that
a common factor underlies the ability to answer both types of questions.
Perhaps expertise in a given subject area is not the important factor in
performance of a probabilistic task. Maybe the ability to deal with
probabilistic thought, and thereby put whatever substantive knowledge is
available to use, is what produces good probabilistic assessments.

This thought is not original with us. Winkler (1967) discusses two
forms of expertise, one in which substantive information is brought to
the task and a second in which the subject understands the probabilistic
task and the concept of uncertainty. Further study of the regression
analysis results shows similar patterns for the beta weights on Dodger and
Slovic et al. questions. Therefore, the pattern of information used in the
two different types of questions is remarkably similar. This finding is
congruent with the two forms of expertise hypothesis.

The results of this line of experimentation have raised many
more questions than they have answered. Study into the nature of
expertise will obviously be a fruitful line of endeavor and should
precede further work on marker event techniques. It could well be
that training of experts in probabilistic thinking would lead to
significant improvement in the quantification of uncertainty.


II A.5. Estimating subjective probability distributions.

Probably the most cited unpublished work in decision theory,
that of Alpert and Raiffa (1969), found that different methods for
assessing probability distributions resulted in different levels of
bias in responses, in their case "too tight distributions". Seaver,
von Winterfeldt and Edwards (1975) went on to show that the amount
of bias as measured by "surprises", a true value falling outside a
specific central interval of the assessed distribution, was affected
in systematic ways by the method used to assess the probability
distribution. These results are consistent with many others in decision
theoretic research in which assessment techniques have contributed
to the quality of probabilistic judgment.

One problem with studies of probability assessment is that
the available dependent measures are not completely satisfying ones.
Proper scoring rules do not provide a sensitive measure of how closely
an elicited probability distribution reflects the assessor's beliefs
about the chances of occurrence of the events over which the distribution
is assessed. Typical measures of calibration involve the proportion of
events which, historically and across distributions, lie on fixed
intervals of the assessed distribution. The proportion of events obtained
that were assessed as lying within the interquartile interval
(.25 ≤ p ≤ .75) is an example. Such measures require assessments
over huge numbers of distributions to be reliable. Furthermore, many
interesting questions regarding the usefulness of elicitation techniques
simply cannot be answered without a measure which takes into account a
larger number of the important features which distinguish one probability
distribution function from another.
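
The interquartile calibration measure mentioned above can be sketched as
follows. The data layout (one tuple of assessed .25 fractile, assessed
.75 fractile, and true value per distribution) and the function name are
assumptions made for illustration.

```python
def interquartile_hit_rate(assessments):
    """Proportion of true values inside the assessed .25 to .75 interval.

    `assessments` is a list of (q25, q75, true_value) tuples, one per
    assessed distribution. A well-calibrated assessor should approach
    0.5 over many distributions; a lower rate indicates distributions
    that are too tight.
    """
    hits = sum(1 for q25, q75, true_value in assessments
               if q25 <= true_value <= q75)
    return hits / len(assessments)


# Hypothetical assessments: two of the four true values fall inside.
example = [(1.0, 3.0, 2.0), (1.0, 3.0, 5.0), (0.0, 10.0, 4.0), (2.0, 4.0, 1.0)]
rate = interquartile_hit_rate(example)
```

Because each distribution contributes only a single hit or miss, the
estimate is coarse unless many distributions are assessed, which is the
reliability problem the text raises.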

John & Edwards (1977) investigated the possibility of presenting Ss
with a sample distribution of a random variable and eliciting the
population distribution (density) from which the sample was presumably
drawn. Stimuli were pickup sticks (length = 6.5 inches), painted blue
and yellow. The length of yellow on each stick constituted the random
variable. Subjects were shown three sample distributions (uniform,
modal, and bimodal) of twenty-six sticks each.

Each subject used one of three probability elicitation procedures
to convey his (her) knowledge of the population distribution from which
the sample was presumably drawn. In the fractile procedure, Ss were
asked "to give a length of yellow such that a stick chosen randomly from
the population just sampled will have a length of yellow less than or
equal to the length you give with probability = (.99, .75, .50, .25, .01)".
In the probability procedure, Ss were asked "to judge what the probability
is that a stick chosen randomly from the population just sampled has a
length of yellow less than or equal to (.65", 1.95", 3.25", 4.55", 5.85")".
A third procedure (graph) required Ss to draw a curve, of which "the
height at each point represents the relative probability that a stick
drawn at random from the population will have that length of yellow".
The fractile and probability methods were used by Seaver, von Winterfeldt,
and Edwards (1975) and essentially involve obtaining estimates of five
points (ordered pairs) along the cumulative distribution. The graph
technique obtains a sketch of the S's density function.

For each of the curves produced using the graph technique, fourteen
points evenly spaced along the curve were used as input into a numerical
integration algorithm to produce a piecewise representation of the
assessed cumulative distribution, such as those already obtained in the
probability and fractile techniques. The dependent measure was taken to
be the maximum deviation (vertically) between the piecewise representation
of each elicited distribution and the corresponding sample distribution
which was shown.
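
One way to carry out the computation just described is sketched below.
The trapezoidal rule, the normalization to a total area of 1, and the
choice to compare the two distributions only at the grid points are
assumptions; the report does not name the integration algorithm, and the
function names are illustrative.

```python
def cumulative_from_density(xs, heights):
    """Integrate sketched density heights into a cumulative distribution.

    Uses the trapezoidal rule between successive grid points and rescales
    so that the final cumulative value is 1.
    """
    areas = [0.0]
    for i in range(1, len(xs)):
        step = 0.5 * (heights[i] + heights[i - 1]) * (xs[i] - xs[i - 1])
        areas.append(areas[-1] + step)
    total = areas[-1]
    return [a / total for a in areas]


def empirical_cdf(sample, x):
    """Proportion of sample values less than or equal to x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def max_vertical_deviation(xs, cdf_values, sample):
    """Largest vertical gap between the elicited and sample distributions."""
    return max(abs(c - empirical_cdf(sample, x))
               for x, c in zip(xs, cdf_values))


# Hypothetical flat sketch on a three-point grid and a two-stick sample.
grid = [0.0, 1.0, 2.0]
elicited = cumulative_from_density(grid, [1.0, 1.0, 1.0])
gap = max_vertical_deviation(grid, elicited, [0.5, 0.6])
```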

The goodness of fit between the elicited and sample distributions
was found to be a non-additive function of assessment technique and
sample distribution shape. Although the fractile procedure performed
substantially worse for all three sample distributions, the relative
performance of the probability and graph methods varies as a function of
sample distribution. The finding that biases in probability assessment
result from an interaction between the method of assessment and the
shape of the distribution is an important one; the development of an
experimental paradigm to adequately evaluate probability assessments is
a topic worthy of further attention.

From an applied point of view, this experiment once more suggests
that the custom of using fractile techniques for assessing continuous
distributions rather than any of the equally simple or simpler alternatives
is probably unwise and in need of change. Moreover, it offers evidence
for the simplest of all possible alternatives: If you want someone to
assess a continuous probability distribution, just ask him to draw it.

II B. Multiattribute utility analysis: Validation and Application

Four distinct approaches to the validation of multiattribute utilities
can be identified.

1.  Like  preferences,  utilities  are  inherently  correct  and  do  not 
need  to  be  validated. 

We, as psychologists knowing that all self-reports can err,
consider this idea untenable and not worthy of serious discussion. For
that reason, the fact that (as we see it) this view dominates the decision
theoretical literature is continually baffling to us.

2. Utilities express strengths of preference; and therefore one
validates them by discovering whether or not they correctly predict
preference.

If preferences are the ultimate criterion, why bother with utilities?
Preferences can be observed directly, and the whole structure of deterministic
utility theory is then irrelevant.

3. Utilities are hypothetical constructs, approachable in a
number of different ways. Convergent validity is all that can or should
be sought. That is, various ways of eliciting values should lead to
intelligibly related though not necessarily identical results.

This is an intellectually respectable view, on which we have been
doing a lot of work, summarized below.

4. One cannot validate utilities themselves; one can only validate
methods of measuring them. This is done by finding or creating stimuli
for which values are known, eliciting utilities by various methods, and
concluding that the method that most closely approximates the "true"
utility is thereby validated. (Note that approaches 3 and 4, though
different, do not conflict.)

We remain intellectually much stimulated by this fourth approach,
and are beginning to find ways of implementing it. We hope it will be a
major theme next year. We think we may see a way of combining it with
the third via an application of the Brunswikian lens model approach.
We have identified two situations (diamonds and credit risks) in which
externally specified and quite explicit multiattribute utility structures
already exist.

Convergent validation (to return to approach 3) assumes that a
necessary and sufficient condition for a given model or assessment
procedure to be valid is that overall utilities which follow from the
assessment agree (high Pearson product-moment correlation) with utilities
elicited in some other manner. Approaches within this framework, whether
"behavioral" or "analytical" in nature, are essentially complex sensitivity
analyses in the sense that they study the sensitivity of the output
utilities to such inputs as model structure, elicitation techniques,
and respondent identity. Such studies, designed primarily to determine
the amount of common variance shared by different modeling or elicitation
procedures, are of tremendous practical significance. In particular,
the tradeoff between model precision and ease of elicitation must be
addressed in almost any application of MAUA.
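
As an illustration only, the convergent validity criterion described above
can be sketched in Python. The two utility vectors are hypothetical numbers
invented for this example (they are not data from any study reported here),
and the function name pearson is our own. The squared correlation is the
proportion of common variance shared by the two elicitation procedures.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical overall utilities for the same six alternatives,
# obtained from two different elicitation procedures.
utilities_method_a = [0.10, 0.35, 0.40, 0.62, 0.80, 0.95]
utilities_method_b = [0.20, 0.30, 0.55, 0.60, 0.70, 0.90]

r = pearson(utilities_method_a, utilities_method_b)
print(f"convergent validity r = {r:.3f}, shared variance r^2 = {r * r:.3f}")
```

A high r here says only that the two procedures agree with each other; as
approach 4 notes, it does not show that either one recovers a "true" utility.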

II. B.1. Model simplifications: A review

Leung (see Summary No. 5) provided a review of theoretical and
empirical research findings regarding the sensitivity of MAUA to model
specification. The question addressed was whether additional complexities
(such as non-additivity, uncertainty, and differential weighting) are
useful. Although a few of the studies considered produced analytic
solutions to the questions asked, most were either Monte Carlo simulations
or behavioral studies. The criterion for intermodel agreement, in
almost every study, was the correlation between utilities output by the
specified models. Leung came to the following conclusions:

1. Additive models should be used as an approximation to more
complicated structures, at least for the two attribute case (unless
there are good reasons to believe that a non-additive model is an exact
representation of a decision maker's attitudes).

2.  Weights  do  not  matter  for  deterministic  additive  models  when, 
on  the  average,  the  attributes  are  highly  correlated  with  each  other. 

3. No conclusions may be drawn regarding how well deterministic
models approximated more complicated probabilistic ones.

Of most interest in Leung's analysis was a call for a "measure
of robustness other than the coefficient of correlation". Although
he was referring to studies involving probabilistic models only, the
need for dependent measures of fit is great (see Anderson and Shanteau
(1977) for a discussion of this problem).


II. B.2. Monte Carlo simulation: Weighting.

In a study concerning the issue of differential vs. unit weighting
for additive deterministic utility functions, Newman (see Summary No. 6)
exploited the similarity between the formal mathematical structure of
the multiple regression model and the additive utility model (under
certainty). Using simulation techniques similar to those described in
Newman (1976), Newman (1977) considers two methods of estimating beta
weights for regression models and compares them (in terms of variance
accounted for on cross validation) to the unit weighting technique. Both
procedures for estimating beta weights, ordinary least squares (OLS) and
ridge regression (RIDGE) (Hoerl and Kennard, 1970a,b), proved superior
to unit weighting (UNIT) in all cases save one. In this one case, all
the true coefficients were positive, not too far apart, and the sample
size was relatively small (N = 50). In the overwhelming majority of cases,
unit weighting was simply not appropriate.

Newman also found that the ridge estimates outperformed the OLS
estimates a great deal of the time, replicating several studies which
demonstrate the superiority of the biased RIDGE procedure to the more
popular OLS approach (Dempster, Schatzoff, and Wermuth, 1975; Hoerl, Kennard,
and Baldwin, 1975; Lawless and Wang, 1976). Newman asserts that the argument
in favor of unit weighting is completely shattered when the differential
weights are estimated via RIDGE. By replacing the independent and criterion
variables in the regression model with the attributes and overall utility
construct of a multiattribute utility model, one may ask the following
question: What subjective estimation procedure do people use in determining
their weights for attributes in a MAUA? The answer to this question is
critical. Unit weighting of attributes in a decision analysis is not
appropriate if the decision maker can estimate the attribute weights in
a RIDGE or even OLS manner. However, if subjective estimates of weights
are considerably suboptimal, unit weighting is a boon for the application
of MAUA. A study to answer this question, involving a multiple cue
probability learning task, is now being planned.
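
The kind of comparison described in this section can be sketched in Python.
This is a minimal illustration under stated assumptions, not Newman's
simulation: the true weights, noise level, sample sizes, independent
predictors and all function names are hypothetical choices made for the
example, and only OLS and UNIT are compared. Weights are estimated in one
sample and scored by squared correlation in a fresh sample.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_weights(xs, ys):
    """Least-squares weights (no intercept) from the normal equations."""
    k = len(xs[0])
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(xs, weights):
    """Weighted sum of the predictors for every row."""
    return [sum(w * v for w, v in zip(weights, row)) for row in xs]

def make_sample(rng, n, true_weights, noise_sd):
    """Independent standard-normal predictors plus a noisy linear criterion."""
    xs = [[rng.gauss(0.0, 1.0) for _ in true_weights] for _ in range(n)]
    ys = [sum(w * v for w, v in zip(true_weights, row)) + rng.gauss(0.0, noise_sd)
          for row in xs]
    return xs, ys

def cross_validated_r2(true_weights, n_train=50, n_test=1000, noise_sd=1.0, seed=1):
    """Estimate in a training sample, score in a separate test sample."""
    rng = random.Random(seed)
    train_x, train_y = make_sample(rng, n_train, true_weights, noise_sd)
    test_x, test_y = make_sample(rng, n_test, true_weights, noise_sd)
    ols = ols_weights(train_x, train_y)
    unit = [1.0] * len(true_weights)  # UNIT: predictors are simply added up
    r2_ols = pearson(predict(test_x, ols), test_y) ** 2
    r2_unit = pearson(predict(test_x, unit), test_y) ** 2
    return r2_ols, r2_unit

r2_ols, r2_unit = cross_validated_r2([0.8, 0.3, 0.1])
print(f"cross-validated r^2: OLS = {r2_ols:.3f}, UNIT = {r2_unit:.3f}")
```

Which procedure wins in such a sketch depends on the assumed true weights,
the noise and the training sample size, which is the point of the study.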

II. B.3. Monte Carlo simulation: Number of attributes.

Leung (see Summary No. 7) described a study to explore the
possibility of reducing the number of attributes specified in additive
MAU models under certainty. For each example, he randomly generated a
utility array (alternative by attribute) and a set of weights for the
attributes. Leung systematically varied the number of attributes in the
full model, the average intercorrelation among the attributes, the number
of attributes in the reduced model, and the method for deciding which
attributes to eliminate. Leung investigated the following ad hoc procedures
for reducing the number of attributes:

1. Retain the highest weight attribute, drop the attribute
that correlates highest with it; repeat until the desired number of
attributes are dropped.

2. Ignore intercorrelations, simply drop the desired number of
attributes with the lowest weights.

3. Discard the lowest weight attribute: retain the attribute that
correlates highest with it; repeat until the desired number of attributes are
dropped.

4. Pick the most highly correlated pair of attributes; discard the
lower weight attribute of the two; repeat until the desired number of
attributes are dropped.

Using as the dependent measure the distribution of correlations
(N = 1000) between the full and reduced model, Leung found that methods 2
and 3 (described above) completely dominated methods 1 and 4. He concluded
that method 2, considering its ease of application, was the superior
procedure. That is, unimportant attributes (attributes which receive small
weights) may be eliminated from consideration with little loss. Leung applies
this technique to two real world examples with good results.
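
Method 2 lends itself to a short Monte Carlo sketch in Python. The settings
below (20 alternatives, 8 attributes reduced to 4, 200 trials, uniformly
random and uncorrelated utilities and weights) and the function names are
assumptions made for illustration; they are not Leung's design, which also
varied the average intercorrelation among the attributes.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def overall_utilities(utility_rows, weights, keep):
    """Additive utility of each alternative over the attribute indices in keep."""
    keep = list(keep)
    return [sum(weights[j] * row[j] for j in keep) for row in utility_rows]

def keep_highest_weights(weights, n_keep):
    """Method 2: ignore intercorrelations, keep the n_keep largest weights."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(order[:n_keep])

def simulate(n_alternatives=20, n_attributes=8, n_keep=4, trials=200, seed=0):
    """Correlation between full and reduced model utilities, one per trial."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(trials):
        rows = [[rng.random() for _ in range(n_attributes)]
                for _ in range(n_alternatives)]
        raw = [rng.random() for _ in range(n_attributes)]
        weights = [w / sum(raw) for w in raw]  # normalize to sum to 1
        full = overall_utilities(rows, weights, range(n_attributes))
        kept = keep_highest_weights(weights, n_keep)
        reduced = overall_utilities(rows, weights, kept)
        correlations.append(pearson(full, reduced))
    return correlations

rs = simulate()
print(f"mean r = {sum(rs) / len(rs):.3f}, min r = {min(rs):.3f}")
```

The list returned by simulate plays the role of the distribution of
correlations used as the dependent measure in the study.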

The application of the results of this study is subject to the
same behavioral questions posed by the Newman study discussed previously.
In order to perform Leung's method 2, subjects must be able to accurately
rank order attributes in terms of importance. The findings of the proposed
multiple cue probability learning study described in II. B.2 would be
of obvious importance here.


II. B.4. Behavioral Validation: A construct rather than convergent approach.

In his doctoral dissertation Eils (1977) investigated the use of an
external criterion against which to validate additive utility assessments
under certainty. Eils elicited utility assessments from twenty-four groups,
each of which consisted of four graduate or upper division undergraduate
students who knew each other prior to the experimental session. Group
utilities were elicited (via consensus) for ten hypothetical applicants
for bank credit cards. The research design completely crossed two factors
in assessing group utilities: 1) using a decomposition procedure (MAUA)
or not and 2) using a formal group communication strategy (GCS) or not.

The quality of each group's utility judgments was defined to be the Pearson
product-moment correlation between the group's judged utilities and
utilities output from a configural (nonlinear) model used by Security Pacific
Bank in evaluating applicants for Master Charge. A content analysis
of the group's verbal interaction was made to determine the effects of task
structure on the characteristics of the group process. Group satisfaction
measures were also obtained.

Eils found that the decision technology of MAUA greatly aided groups
in reaching decisions that were more consistent (had higher correlations)
with decisions based on a systematic collection and interpretation of a
large amount of relevant data (i.e. the bank model). When unit weights
were used in place of the elicited differential weights, the MAUA groups
evidenced even higher correlations with the bank model. The application
of a communication strategy did not significantly alter the quality of
group evaluations.

Both task interventions (MAUA and GCS) significantly influenced the
group communication process. In addition, groups employing the MAUA did
not find the task any more complex or difficult, or any less satisfying
than groups not employing the technique. Groups employing GCS did not
find their task any less satisfying or complex. Perhaps for the first
time, decomposed judgments have been shown to exhibit a greater degree
of fit to an external criterion than wholistic judgments. The formalized
bank model used to measure judgmental validity reflects the complex
nature of the relationship between applicant characteristics and subsequent
loan performance. These complex relationships should be similar to the
ones inherent in the information that the groups bring to the assessment
task in the form of past experience. Thus, the degree to which group
decisions correspond to the bank's systematic and complex evaluation
provides a measure of how well the elicitation technique taps the information
actually contained in group members' past experience. Eils argues that the
MAUA procedure he employed proved more valid in that a more complete
representation of each individual's past experience was elicited.

II. B.5. Behavioral Validation: Assessment procedures, model structure,
separability of attributes, and gains vs. losses.

Eustace and Edwards have designed a study to systematically vary
three factors relating to assessment and modeling technique and two factors
which describe the nature of the multiattributed entities to be evaluated.

In a completely "within" design, Eustace and Edwards elicit two attribute
utility functions (either additive or multiplicative) using three popular
elicitation techniques (BRLTs, Rating Scales, and Certainty Equivalents)
and wholistic choices. Another factor, nested within elicitation technique,
is that of risky vs. riskless utility functions. Functions are elicited for
gains and losses from starting positions in commodity bundles which are
either separable (amounts of tea and ice cream) or inseparable (amount of
leanness of ground beef). All assessments are made twice and, with the
exception of the wholistic assessments, no real transactions occur between
subject and experimenter.

The results from this study will address the following questions:

1. How well do simple utility models (riskless, additive)
approximate more complex ones (risky, multiplicative)?

2. How well do simple elicitation procedures (rating scales,
wholistic choices) approximate more complex ones (BRLTs,
Certainty Equivalents)?

3. What is the relationship between gains and losses in
starting position when the final utility attained is
held constant?

4. How are the answers to questions 1 through 3 mediated
by the separability of attributes in the commodity bundles?

II.  B.6.  MAUA  and  Systems  Dynamics. 

Gardiner and Ford (1977) explain a technique for using additive,
riskless, multiattribute utility functions to evaluate the results.

Computer simulation models are frequently developed and used as policy
analysis tools that show, for the system being modeled, how its behavior
over time is influenced by proposed policies. Many simulation efforts stop
at this point and leave the synthesis of the derived results to unaided
intuitive approaches. The emphasis and focus is on developing models that
show consequences of policies, not on the formal evaluation of these
consequences. As a result, simulation models and accompanying policy
recommendations are frequently criticized for failing to take into account
societal interests and values. This paper discusses how MAUA can be applied
to the output of computer simulations to remedy the deficiencies inherent
in the system dynamics methodology.


The paper discusses an application of the technique in energy boom
towns where a system dynamics model of a boom town "feeds" evaluation models
developed from nine viewpoints of individuals (including those of the mayor,
a conservationist, representatives of the energy industry, etc.) in Farmington,
New Mexico. The applicability of this technique merger to military boom town
phenomena is discussed as well as its application to military "bust towns";
i.e. those instances where U.S. military installations are closed.

III. Applications of Decision Technology

The following section summarizes an ARPA Technical Report by Edwards
(1977). Much of that report is concerned with the problems and prospects
for institutionalizing decision analytic techniques in Federal bureaucratic
contexts, and is in particular responsive to the views on that topic of Mr.
Joseph Coates, of the U.S. Office of Technology Assessment. As examples,
Edwards summarizes two extensive applications, both using ARPA-developed
technology but neither funded by ARPA. Both studies include technical
innovations highly relevant to ARPA and DOD needs and problems. The Technical
Report makes special effort to be readily understandable by those unacquainted
with decision technology, probability, and the like; it does, however, assume
experience with Federal bureaucracies.

The first example outlined by Edwards involves the technology of
probability assessment and use of Bayes' theorem. Probabilities of various
diagnoses were assessed by clinicians in emergency room settings all over the
U.S. to determine the diagnostic efficacy of the radiographic procedures
employed. Specifically, clinicians provided probability diagnoses before and
after interpretation of about 8,000 x-rays. The log likelihood ratio, computed
from the prior and posterior probabilities assessed, served as a measure of
the influence of x-rays on clinical diagnosis. The assessments were
accomplished "in the field" by clinicians with a minimum of technical training.
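
The measure can be illustrated with the odds form of Bayes' theorem, in which
posterior odds equal the likelihood ratio times prior odds, so the likelihood
ratio is recovered by dividing one by the other. The Python sketch below is
ours: the probabilities are hypothetical, and base-10 logarithms are an
assumption, since the base used in the study is not stated here.

```python
from math import log10

def odds(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    return p / (1.0 - p)

def log_likelihood_ratio(prior, posterior):
    """log10 of the likelihood ratio implied by a prior and a posterior.

    From posterior odds = likelihood ratio * prior odds.
    """
    return log10(odds(posterior)) - log10(odds(prior))

# Hypothetical case: a clinician's probability for one diagnosis
# before and after seeing the x-ray.
before, after = 0.30, 0.90
print(f"log likelihood ratio = {log_likelihood_ratio(before, after):.2f}")
```

A value near zero means the x-ray left the clinician's opinion essentially
unchanged; large positive or negative values mean it shifted opinion strongly
toward or away from the diagnosis.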

The example suggests extensions of the described methodology to a variety
of real world settings. Any situation in which a costly, perhaps dangerous,
procedure to gather information is employed is amenable to this investigatory
approach. As technology in general advances and methods to reduce uncertainty
become increasingly more available, the decision of whether the amount of
additional information obtained is worth the energy expended in gathering it
will become both more important and more complex. As technological
sophistication increases, the stakes increase and the intuitive ability of man
to choose beneficially between seeking or not seeking more information
decreases. Thus, techniques of studying the efficacy (in some sense) of
information collecting procedures (such as radiography) will become
increasingly important. The most obvious military example has to do with
collection of intelligence information.

In Edwards' second example, the technique of multiattribute
utility analysis (MAUA) is applied to a highly complex social decision
making problem: siting a nuclear waste disposal facility. In contrast
to the first example, the primary focus is on determining measures of
value, not uncertainty. The most important feature of this application
is the use of the MAUA procedure (developed for use by individuals)
by a face-to-face group of decision makers.

Group interactions were structured around the MAUA tasks of
determining dimensions of importance, and weighting those dimensions.
Experts in nuclear engineering from several countries comprised the
groups. Hypothetical alternative waste disposal sites were generated by
one of the experts who had extensive experience with the siting problem.

A numerical demonstration of MAUA evaluation of sites was performed, using
the weights assessed from the experts and linear transformations of values
(or log values) as location measures on utility curves.

Two additions to the usual MAUA technique were employed. Rather than
obtaining ratio scaled weights in which only ratios involving the least
important attribute are checked, the respondents were required to judge
ratios of all possible pairs of weights. This change in elicitation
procedure probably enhanced the reliability, and hence validity, of the
utility model parameter estimates determined via the weighting procedure.
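
How a full set of pairwise ratio judgments might be turned into weights is
not specified here. One common possibility, shown below in Python purely as
an assumption, takes the geometric mean of each row of the judged ratio
matrix and normalizes. The function name and the judgment matrix are
hypothetical; redundant judgments let inconsistencies average out, which is
one way such a procedure could improve reliability.

```python
from math import prod

def weights_from_ratio_judgments(ratios):
    """Normalized weights from a full matrix of judged ratios.

    ratios[i][j] is the judged ratio of weight i to weight j.
    Each row's geometric mean is proportional to that row's weight.
    """
    n = len(ratios)
    row_means = [prod(row) ** (1.0 / n) for row in ratios]
    total = sum(row_means)
    return [g / total for g in row_means]

# Perfectly consistent judgments reproduce the underlying weights.
true_w = [0.5, 0.3, 0.2]
consistent = [[wi / wj for wj in true_w] for wi in true_w]
print(weights_from_ratio_judgments(consistent))

# Hypothetical, slightly inconsistent judgments from one respondent.
judged = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 3.0],
    [0.25, 1.0 / 3.0, 1.0],
]
print(weights_from_ratio_judgments(judged))
```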

Another unique aspect of this endeavor is the rescaling of weights
to reflect the range of values on each dimension for which alternatives
are actually available. Since weights must often be obtained using a best
guess of the ranges on each dimension which the alternatives will span,
this rescaling procedure is a potentially valuable one. Further research
exploring the accuracy of the assumptions implicit in this technique is
still necessary, however.

V. Summaries of Technical Reports


Summary No. 1

How  Groups  can  Assess  Uncertainty:  Human  Interaction 
Versus Mathematical Models

David  A.  Seaver 

Recently developed decision aiding technologies rely upon quantification
of uncertainty as subjective probability. Since groups are often responsible
for making decisions, procedures for assessing the subjective probabilities of
groups are necessary if decision analytic techniques are to be generally
applicable. Two general approaches to this problem exist: mathematical
aggregation, in which individual probabilities are combined via some
mathematical rule to form a single probability assessment, and behavioral
interaction, in which the group members communicate verbally or otherwise to
reduce or eliminate disagreement. Several methods in each of these categories
are reviewed. Since previous results comparing various procedures for
determining group probabilities are equivocal, a study was undertaken to
compare several mathematical aggregation and behavioral interaction approaches.
The results of this study suggested that some interaction tends to increase
the certainty of the group, decrease the calibration, and decrease the
disagreement among group members, although the type of interaction makes
little difference. The mathematical aggregation rule used affects both the
calibration and the certainty of the group. Choice of just which procedure to
use depends on a tradeoff between the desirability of increased certainty and
calibration. In many instances, simple averaging of individual assessments
without any group interaction may be the most desirable procedure simply
because it is the easiest to use.
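
Simple averaging, the easiest of the mathematical aggregation rules, can be
sketched in a few lines of Python. The three members' probabilities are
hypothetical, and the quadratic score shown is one common form of a
quadratic scoring rule (lower is better), assumed here for illustration
rather than taken from the study.

```python
def average_probabilities(individual):
    """Simple average of members' probability vectors over the same events."""
    n = len(individual)
    k = len(individual[0])
    return [sum(member[e] for member in individual) / n for e in range(k)]

def quadratic_score(probs, true_index):
    """Sum of squared differences from the outcome; 0 is a perfect score."""
    return sum((p - (1.0 if e == true_index else 0.0)) ** 2
               for e, p in enumerate(probs))

# Hypothetical assessments by three group members over two hypotheses.
members = [[0.6, 0.4], [0.8, 0.2], [0.4, 0.6]]
group = average_probabilities(members)

true_index = 0  # suppose the first hypothesis turned out to be true
group_score = quadratic_score(group, true_index)
mean_individual = sum(quadratic_score(m, true_index) for m in members) / len(members)
print(f"group score = {group_score:.3f}, mean individual score = {mean_individual:.3f}")
```

Because this score is convex in the probabilities, the averaged assessment
never scores worse than the members' average score, whichever hypothesis is
true.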

Summary No. 2

The Effects of Response Scales on Likelihood Ratio Judgments

William G. Stillwell, David A. Seaver, and Ward Edwards

Different methods of eliciting responses to the same question often
produce different responses. In order to systematically study how response
scales affect likelihood ratio judgments, two experiments were conducted.
Experiment I manipulated two independent variables: the endpoints of the
response scales (100:1, 1000:1, 10,000:1) and the spacing of the scales
(logarithmic versus linear). Results compared the veridicality of responses
on the six scales produced by crossing these factors plus another response
mode in which subjects simply wrote their judgment in a blank (no scale).

Logarithmic scales produced responses that were both more veridical and
more consistent than responses on linear scales which were, in turn, better
than simple written responses. Measures of the effects of the endpoints were
somewhat inconsistent and probably interacted with the range of veridical
likelihood ratios. Judgments of relatively small likelihood ratios were
affected by the spacing: linear spacing caused overestimation. Judgments of
relatively large likelihood ratios were controlled more by the endpoints:
higher endpoints produced larger judgments. Apparently, subjects use the range
of the scale as information about the range of true likelihood ratios.

Experiment II manipulated two additional variables, data diagnosticity
and the values of the true likelihood ratios. The results of Experiment I
were confirmed while neither of the additional variables radically changed the
effects of endpoints or spacing.

Summary No. 3

Developing the Technology of Probabilistic Inference:
Aggregating by Averaging Reduces Conservatism


Lee C. Eils, III, David A. Seaver, and Ward Edwards

A relatively large body of research indicates that people are
conservative processors of probabilistic information. Recent attention
has focused on two possible explanations of this phenomenon. The
misaggregation hypothesis depicts conservatism as an inability to properly
combine the information in a data sequence. The other explanation suggests
conservatism is the result of a response bias: the avoidance of extreme
odds or probability judgments.

Two experiments explored the use of a specific response, average
certainty, that was devised to thwart conservatism caused by either
response bias or misaggregation. Use of appropriate instructions and
response scales made the average certainty judgments good subjective
assessments of the arithmetic mean likelihood ratio which could then be used
in the appropriate form of Bayes' Theorem to calculate posterior odds. These
judgments seemed unlikely to be affected by a response bias since extreme
responses were not needed. In addition, research has suggested that people
are more likely to aggregate information by averaging than by adding or
multiplying, so misaggregation may be exhibited only in specific forms of
aggregation and may not be present in averaging.

The results of Experiment I indicated that average certainty judgments
were both more orderly and more veridical than cumulative certainty judgments
of the type usually obtained in probabilistic inference tasks. The cumulative
judgments were very conservative while the average certainty judgments were
only slightly radical. Experiment II indicated that average certainty
judgments and individual likelihood ratio judgments were both more orderly and
veridical than cumulative certainty judgments but that they did not differ
significantly from each other in either orderliness or veridicality. A
second factor, the diagnosticity level of the data, was also found to
influence the veridicality of obtained judgments. Regardless of the method
of aggregation employed, estimates became more veridical as the data
became more diagnostic. Since these studies were undertaken only to
see if average certainty judgments are an effective way to reduce
conservatism, they do not directly test what causes conservatism.
However, some implications concerning the nature of conservatism are
discussed, as are the implications for the technology of probabilistic
inference.

Summary No. 4

Subjective Probability Assessment and the
Shape of the Distribution

Richard S. John and Ward Edwards

Seventy-two subjects were presented with three samples of pickup sticks,
each painted yellow and blue. After viewing each sample distribution,
subjects assessed subjective probability distributions over the "length
of yellow" painted on sticks in each population sampled. Each subject
utilized one of three popular probability assessment techniques in
making the uncertainty judgments. The shape of the distribution of
lengths of yellow was found to interact with assessment technique,
suggesting that biases introduced in subjective probability distributions
vary as a function of the uncertain quantity being assessed. The customary
"fractile" procedure for assessing continuous probability distributions
consistently yielded the worst fitting subjective assessments.


Summary No. 5

Sensitivity Analysis of the Effect
of Variation in the Form and Parameters of
a Multiattribute Utility Function: A Survey

Patrick  Leung 

There is a trend towards the development of complicated versions
of multiattribute utility models. These models, although theoretically
more accurate in the representation of decision makers' attitudes, require
assessment procedures which are more difficult and time consuming to
implement than simpler models. The paper reviews theoretical and empirical
research on the sensitivity of multiattribute utility models with emphasis
on simplification. Both deterministic and probabilistic models are considered
and the studies are divided into four areas: 1) those involving sensitivity
to the form of the multiattribute utility function; 2) those involving
sensitivity to the parameters of the function; 3) those involving
sensitivity to the form of individual single attribute utility functions;
and 4) those involving the relationship between deterministic and
probabilistic models.

Summary No. 6


Differential  Weighting  for  Prediction  and  Decision  Making  Studies 
A Study  of  Ridge  Regression 

J.  Robert  Newman 


This paper is another in a series exploring the conditions under which
either differential or simple unit weighting of predictor variables in
prediction and/or decision studies will be appropriate. Some of the
difficulties of applying the ordinary least squares (OLS) analysis to
practical problems are described and an alternative regression model called
ridge analysis (RIDGE) is offered as a substitute to OLS. The trouble with
OLS is that when the predictor variables are intercorrelated, then the
regression coefficients estimated by OLS are often quite deviant from the
"true" coefficients. They are often too large in absolute value and the sign
of the coefficient can be wrong. The RIDGE solution to this is very simple:
just add small positive values to the main diagonal of the correlation matrix
depicting the intercorrelations between the predictor variables, and
re-estimate the coefficients in the usual manner. The resulting estimates are
called RIDGE estimates and in theory they will be superior to OLS estimates
in the sense of producing smaller error in cross validation samples. That is,
when OLS and RIDGE estimates are estimated in one sample of data, and then
tested on a new sample of data, the RIDGE estimates will result in fewer
errors of prediction than the OLS estimates.
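
The RIDGE adjustment described above can be illustrated for the simplest
case of two standardized predictors, where the system (R + kI) b = r has a
closed-form solution. The Python sketch below is an illustration under
assumptions: the correlations and the values of k are hypothetical, the
function name is ours, and how k is chosen is not addressed.

```python
def standardized_coefficients(r12, r1y, r2y, k=0.0):
    """Solve (R + kI) b = r for two standardized predictors.

    r12 is the correlation between the predictors; r1y and r2y are their
    correlations with the criterion. k = 0 gives the OLS beta weights;
    k > 0 adds k to the main diagonal of R, giving ridge estimates.
    """
    d = 1.0 + k                      # diagonal element after adding k
    det = d * d - r12 * r12          # determinant of the 2 x 2 matrix
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    return b1, b2

# Hypothetical, highly intercorrelated predictors that both correlate
# positively with the criterion.
for k in (0.0, 0.1, 0.2):
    b1, b2 = standardized_coefficients(0.95, 0.60, 0.50, k)
    print(f"k = {k:.1f}: b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With these assumed correlations the OLS solution gives the second predictor
a negative weight although it correlates positively with the criterion;
adding a modest k shrinks both coefficients and restores the sign.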

Several empirical studies were conducted using computer simulated data for
various prediction situations. The OLS and RIDGE models were compared as to
their efficacy in prediction and both models were compared against the
simplest model possible, that of unit weighting (UNIT), in which no weighting
is performed; the variables are simply added up and the sum used for
prediction. The results of these studies indicate that OLS and RIDGE, with one
exception, always outperformed UNIT with respect to producing smaller errors
of prediction and, what is more important, RIDGE always did better than OLS.
The one exception in which UNIT did better than OLS and RIDGE is for the case
in which all the "true" coefficients are positive, not too far apart, and the
sample size is relatively small (<50). This is a very restricted class of
conditions. The
general conclusion is that UNIT weighting will rarely be preferred over methods that
generate differential weights. Also, the RIDGE method of estimation (RIDGE)
should always be the preferred model over OLS. One practical implication of
this is that if an investigator does not have the luxury of doing cross-validation,
then RIDGE estimation can be used as a substitute for cross-validation.
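
A cross-validation comparison of the kind summarized above might be set up along the following lines. This is a minimal sketch, not a reconstruction of the report's simulations: the equicorrelated predictors, the coefficient vector `beta`, the sample sizes, the ridge constant, and the rescaling of the UNIT composite by a single slope fitted in the first sample are all assumptions made for illustration. No particular ordering of the three models is implied by the code.

```python
import numpy as np

def fit_weights(Z, y, k):
    """Solve (R + kI) b = r on standardized predictors; k = 0 is OLS."""
    n, p = Z.shape
    R = Z.T @ Z / n                     # predictor correlation matrix
    r = Z.T @ y / n
    return np.linalg.solve(R + k * np.eye(p), r)

def cross_validated_mse(beta, rho, n_train, n_test, k, rng):
    """One replication: estimate in one sample, test on a new sample."""
    p = len(beta)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)   # unit variances

    def draw(n):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        return X, X @ beta + rng.normal(size=n)

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)
    mean_x, sd_x, mean_y = X_tr.mean(axis=0), X_tr.std(axis=0), y_tr.mean()
    Z_tr, Z_te = (X_tr - mean_x) / sd_x, (X_te - mean_x) / sd_x
    y_c = y_tr - mean_y

    def mse(pred):
        return float(np.mean((y_te - pred) ** 2))

    result = {}
    for name, ridge_k in (("OLS", 0.0), ("RIDGE", k)):
        b = fit_weights(Z_tr, y_c, ridge_k)
        result[name] = mse(mean_y + Z_te @ b)

    # UNIT: add up the standardized variables; one slope rescales the sum.
    unit_tr, unit_te = Z_tr.sum(axis=1), Z_te.sum(axis=1)
    slope = (unit_tr @ y_c) / (unit_tr @ unit_tr)
    result["UNIT"] = mse(mean_y + slope * unit_te)
    return result

rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5, 0.25, 0.0])   # hypothetical "true" coefficients
runs = [cross_validated_mse(beta, 0.8, 30, 1000, 0.1, rng) for _ in range(200)]
average = {m: float(np.mean([r[m] for r in runs])) for m in ("OLS", "RIDGE", "UNIT")}
```

Varying `beta`, `rho`, and `n_train` gives a way to explore conditions such as the restricted case noted above, in which all coefficients are positive and similar and the sample is small.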
Summary No. 7

The Effects of Reducing the Number of Attributes in
Additive Multiattribute Utility Modeling Under Certainty

Patrick  Leung 

This paper explores the effects of reducing the number of attributes in
multiattribute utility modeling under certainty. Four different schemes for
reducing the number of attributes are tested, using Monte Carlo simulation.

A simple method which ignores intercorrelations among attributes and takes
only the weights into account is found to yield a reduced model whose correlation
with the original full model is highest. This method is applied to two
real-world examples (an automobile evaluation problem and a coastal development
site selection problem) and yields good results in both cases. The amount of
time and effort saved through the use of the reduced model instead of the full
model is found to be considerable.
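
One reading of the weight-only scheme is to keep the attributes with the largest weights and renormalize. The sketch below follows that reading, which is an assumption, since the summary does not spell out the rule. The helper `reduce_by_weight`, the six weights, and the uniformly distributed, independent attribute values are hypothetical.

```python
import numpy as np

def reduce_by_weight(weights, m):
    """Keep the m largest weights, renormalized to sum to 1; zero the rest."""
    weights = np.asarray(weights, dtype=float)
    keep = np.argsort(weights)[::-1][:m]          # indices of the m largest weights
    reduced = np.zeros_like(weights)
    reduced[keep] = weights[keep] / weights[keep].sum()
    return reduced

# Monte Carlo check: how closely does the reduced additive model track the full one?
rng = np.random.default_rng(1)
weights = np.array([0.30, 0.25, 0.20, 0.10, 0.08, 0.07])   # hypothetical weights
values = rng.uniform(0, 100, size=(1000, len(weights)))    # single-attribute utilities
full = values @ weights
reduced = values @ reduce_by_weight(weights, 3)
corr = np.corrcoef(full, reduced)[0, 1]                    # reduced vs. full model
```

The correlation `corr` between the two sets of overall utilities is the figure of merit referred to in the summary; the saving comes from eliciting three attributes instead of six.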

probability judgments to form a group judgment did not differ significantly
from behavioral interaction in final quality of the judgments as evaluated
by a quadratic scoring rule. Other experimental work indicated that elicitation
techniques were of significant importance to the quality of judgments:
response scales were found to affect both the magnitude and veridicality of
probabilistic judgment. In the assessment of subjective probability
distributions, elicitation technique was found to interact with the type of
distribution used to generate the data, in that biases introduced in subjective
probability distributions varied as a function of the uncertain quantity
being assessed.
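
For readers unfamiliar with quadratic scoring, one common form of the rule is sketched below. The report does not state which variant was used, so the formula here (twice the probability assigned to the event that occurred, minus the sum of squared probabilities) and the function name `quadratic_score` are assumptions for illustration only.

```python
def quadratic_score(probabilities, outcome_index):
    """Quadratic score 2*p_c - sum(p_i**2) for mutually exclusive events.

    p_c is the probability assigned to the event that actually occurred.
    Higher scores indicate better judgments; the maximum is 1.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return 2.0 * probabilities[outcome_index] - sum(p * p for p in probabilities)

# A confident, correct judgment scores higher than an uninformative one.
confident = quadratic_score([0.7, 0.2, 0.1], 0)
uninformative = quadratic_score([0.5, 0.5], 0)
```

Because the rule is proper, an assessor maximizes the expected score by reporting probabilities that match his or her actual beliefs.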


Simplification techniques for the assessment of multiattribute utilities
were investigated, and it was found that several methods for the selection
of subsets of the total number of attributes lead to remarkably robust results.
Ridge  regression  was  tested  as  an  alternative  to  the  standard  least  squares 
method  of  estimating  weights  in  the  assessment  of  utility  under  certainty  and 
found  to  outperform  the  least  squares  procedure  a great  deal  of  the  time. 

The marriage of multiattribute utility assessment (MAUA) and systems dynamics
was undertaken, with the MAUA used to evaluate the outputs of the dynamic
model.


A practical discussion of the application of decision technology examined
its usefulness in Department of Defense contexts. Problems with
institutionalizing decision analysis techniques in Federal bureaucratic contexts
were discussed, as well as two applications illustrating procedures.